Regret optimality in semi-Markov decision processes with an absorbing set
Abstract
We consider the optimization problem for the general utility case in semi-Markov decision processes with a countable state space. We introduce a regret-utility function of two variables: a target value and a present value. The criterion is the expectation of the regret-utility function incurred up to the time the process reaches a given absorbing set. To characterize a regret-optimal policy, we derive the optimality equation and then prove that its solution is unique. As applications, two examples of regret-utility functions illustrate the analysis of these models.
Keywords: Regret optimal policy, Semi-Markov decision processes, General regret-utility, Optimality equation.
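The abstract does not state the form of the optimality equation, so the following is only a hedged illustration of the general idea, not the paper's method. It assumes a hypothetical regret-utility u(T, x) = max(T - x, 0) (the shortfall of the present value x below the target T), a two-state toy model with absorbing set {2}, and a reward earned at rate r(i, a) over an integer sojourn time τ. Because u is nonlinear in the total reward, the sketch carries the reward y accumulated so far in the state and solves V(i, y) = u(T, y) for i in the absorbing set, and V(i, y) = min over a of Σ P(j, τ | i, a) · V(j, y + r(i, a)·τ) otherwise. All names (MODEL, TARGET, regret, q_value, value, best_action) are hypothetical.

```python
from functools import lru_cache

# Hypothetical toy model (not from the paper): transient states 0 and 1,
# absorbing set {2}. Each action maps to (reward_rate, outcomes), where each
# outcome is (probability, next_state, sojourn_time). The reward earned on a
# transition is reward_rate * sojourn_time, always a positive integer here.
ABSORBING = {2}
TARGET = 8  # hypothetical target value T
MODEL = {
    0: {"safe": (1, [(1.0, 1, 2)]),
        "risky": (3, [(0.5, 2, 2), (0.5, 2, 1)])},
    1: {"safe": (1, [(1.0, 2, 3)]),
        "risky": (2, [(0.5, 2, 2), (0.5, 0, 1)])},
}


def regret(target, present):
    """Hypothetical regret-utility u(T, x): the shortfall of x below T."""
    return max(target - present, 0)


def q_value(state, accumulated, action):
    """Expected regret of taking `action` now and acting optimally afterwards."""
    rate, outcomes = MODEL[state][action]
    return sum(p * value(nxt, accumulated + rate * tau)
               for p, nxt, tau in outcomes)


@lru_cache(maxsize=None)
def value(state, accumulated):
    """Minimal expected regret from `state` given reward `accumulated` so far."""
    if state in ABSORBING:
        # Boundary condition: regret is charged on reaching the absorbing set.
        return regret(TARGET, accumulated)
    if accumulated >= TARGET:
        # Rewards are nonnegative, so the shortfall regret is already 0.
        # Every transition adds at least 1, so the recursion always ends here
        # or at the absorbing set.
        return 0.0
    return min(q_value(state, accumulated, a) for a in MODEL[state])


def best_action(state, accumulated):
    """A minimizing action; ties go to the first action listed in MODEL."""
    return min(MODEL[state], key=lambda a: q_value(state, accumulated, a))
```

Under these assumptions value(0, 0) is 1.0 and best_action(0, 0) is "safe", while from state 1 with accumulated reward 2 the minimizer is "risky" (expected regret 1.0 versus 3.0). The optimal action depends on the reward collected so far, which is why the sketch augments the state with it.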
Similar resources
Denumerable Undiscounted Semi-Markov Decision Processes with Unbounded Rewards
This paper establishes the existence of a solution to the optimality equations in undiscounted semi-Markov decision models with countable state space, under conditions generalizing the hitherto obtained results. In particular, we merely require the existence of a finite set of states in which every pair of states can reach each other via some stationary policy, instead of the traditional and re...
Semi-Markov Decision Processes
This paper considers infinite-horizon semi-Markov decision processes (SMDPs) with finite state and action spaces. It reviews the total expected discounted reward and long-run average expected reward optimality criteria, gives a solution methodology for each criterion, and also discusses constraints and variance sensitivity.
Exploration-Exploitation in MDPs with Options
While a large body of empirical results shows that temporally-extended actions and options may significantly affect the learning performance of an agent, the theoretical understanding of how and when options can be beneficial in online reinforcement learning is relatively limited. In this paper, we derive upper and lower bounds on the regret of a variant of UCRL using options. While we first a...
On Minimizing Ordered Weighted Regrets in Multiobjective Markov Decision Processes
In this paper, we propose an exact solution method to generate fair policies in Multiobjective Markov Decision Processes (MMDPs). MMDPs consider n immediate reward functions, representing either individual payoffs in a multiagent problem or rewards with respect to different objectives. In this context, we focus on the determination of a policy that fairly shares regrets among agents or objectiv...
Optimal Threshold Probability and Policy Iteration in Semi-Markov Decision Processes
We consider an undiscounted semi-Markov decision process with a target set, and our main concern is the problem of minimizing a threshold probability. We formulate the problem as an infinite-horizon case with a recurrent class. We show that the optimal value function is the unique solution to an optimality equation and that there exists a stationary optimal policy. Also several value iteration methods and a polic...